Softmax function

Interpretation

"Attention Is Off By One" explains what the softmax function does in the context of the Transformer model.
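For a concrete picture of the operation itself, here is a minimal sketch in plain Python. It is not taken from the cited post; the function name `softmax` and the example scores are illustrative assumptions. Softmax maps a list of real-valued scores (in a Transformer, the attention scores of one query against each key) to positive weights that sum to 1.

```python
import math

def softmax(scores):
    """Map real-valued scores to positive weights that sum to 1."""
    # Subtracting the maximum avoids overflow in exp() and leaves
    # the result unchanged, since softmax is shift-invariant.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical attention scores for one query against three keys.
weights = softmax([2.0, 1.0, 0.1])
print(weights)
```

Larger scores receive larger weights, and every weight is strictly positive, so under plain softmax no input can be given exactly zero weight.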

Usages

Variations